What Is Claimed Is: 



1. A method for enhancing reliability, availability and serviceability in a computer system by replacing a signal from a failed sensor with an estimated signal derived from correlations with other instrumentation signals in the computer system, comprising:
determining whether a sensor has failed in the computer system; and
if the sensor has failed, using an estimated signal for the failed sensor in place of the actual signal from the failed sensor during subsequent operation of the computer system, whereby the computer system can continue operating without the failed sensor;
wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system.



2. The method of claim 1, wherein determining whether the sensor has failed involves:
deriving an estimated signal for a sensor from correlations with other instrumentation signals in the computer system; and
comparing a signal from the sensor with the estimated signal to determine whether the sensor has failed.

3. The method of claim 2, wherein comparing the signal from the sensor with the estimated signal involves using sequential detection methods to detect changes in the relationship between the signal from the failed sensor and the estimated signal.

4. The method of claim 3, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).

5. The method of claim 1, wherein prior to determining whether the sensor has failed, the method further comprises determining correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals.

6. The method of claim 5, wherein determining the correlations involves using a non-linear, non-parametric regression technique to determine the correlations.

7. The method of claim 6, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

8. The method of claim 5, wherein determining the correlations can involve using a neural network to determine the correlations.

9. The method of claim 1, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

10. The method of claim 1, wherein the failed sensor can be a sensor that has totally failed, or a sensor with degraded performance.
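By way of illustration only, and without limiting the claims, the sequential detection recited in claims 3 and 4 can be sketched as a Wald SPRT applied to the residual between the measured signal and the estimated signal. The function name `sprt_mean_shift`, the Gaussian residual model, and the parameters `sigma`, `shift`, `alpha` and `beta` are assumptions made for this sketch; the claims do not specify them.

```python
import math

def sprt_mean_shift(residuals, sigma, shift, alpha=0.01, beta=0.01):
    """Wald SPRT on residuals (measured signal minus estimated signal).

    Assumed model (not taken from the claims):
      H0: residuals ~ N(0, sigma^2)      sensor healthy
      H1: residuals ~ N(shift, sigma^2)  sensor degraded or failed
    alpha is the tolerated false-alarm rate, beta the tolerated miss rate.
    Returns (decision, samples_used).
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0
    for n, r in enumerate(residuals, start=1):
        # Log-likelihood ratio contribution of one Gaussian sample.
        llr += (shift / sigma ** 2) * (r - shift / 2)
        if llr >= upper:
            return "failed", n
        if llr <= lower:
            return "healthy", n
    return "undecided", len(residuals)
```

A decision is reached as soon as the accumulated evidence crosses either threshold, so the number of samples consumed varies with how clearly the residuals favor one hypothesis.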



11. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for enhancing reliability, availability and serviceability in a computer system by replacing a signal from a failed sensor with an estimated signal derived from correlations with other instrumentation signals in the computer system, the method comprising:
determining whether a sensor has failed in the computer system; and
if the sensor has failed, using an estimated signal for the failed sensor in place of the actual signal from the failed sensor during subsequent operation of the computer system, whereby the computer system can continue operating without the failed sensor;
wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system.

12. The computer-readable storage medium of claim 11, wherein determining whether the sensor has failed involves:
deriving an estimated signal for a sensor from correlations with other instrumentation signals in the computer system; and
comparing a signal from the sensor with the estimated signal to determine whether the sensor has failed.

13. The computer-readable storage medium of claim 12, wherein comparing the signal from the sensor with the estimated signal involves using sequential detection methods to detect changes in the relationship between the signal from the failed sensor and the estimated signal.



14. The computer-readable storage medium of claim 13, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).

15. The computer-readable storage medium of claim 11, wherein prior to determining whether the sensor has failed, the method further comprises determining correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals.

16. The computer-readable storage medium of claim 15, wherein determining the correlations involves using a non-linear, non-parametric regression technique to determine the correlations.

17. The computer-readable storage medium of claim 16, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

18. The computer-readable storage medium of claim 15, wherein determining the correlations can involve using a neural network to determine the correlations.

19. The computer-readable storage medium of claim 11, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

20. The computer-readable storage medium of claim 11, wherein the failed sensor can be a sensor that has totally failed, or a sensor with degraded performance.
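By way of illustration only, and without limiting the claims, the non-linear, non-parametric regression recited in claims 16 and 17 can be sketched with a generic Nadaraya-Watson kernel regression. This is a simpler stand-in chosen for brevity, not the multivariate state estimation technique itself. The function name `kernel_estimate`, the Gaussian kernel, and the `bandwidth` parameter are assumptions made for this sketch.

```python
import math

def kernel_estimate(history, query, bandwidth=1.0):
    """Estimate one signal from the current values of correlated signals.

    history: list of (inputs, target) pairs recorded while every sensor
             was believed healthy; inputs is a tuple of the other signals.
    query:   current readings of the other signals, same length as inputs.
    The estimate is a weighted average of past targets, with more weight
    on observations whose inputs are close to the query.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for inputs, target in history:
        dist2 = sum((a - b) ** 2 for a, b in zip(inputs, query))
        w = math.exp(-dist2 / (2 * bandwidth ** 2))  # Gaussian kernel weight
        weighted_sum += w * target
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("query is too far from every training observation")
    return weighted_sum / weight_total
```

No functional form is fitted in advance; the recorded observations themselves act as the model, which is what makes the approach non-parametric.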

21. An apparatus that enhances reliability, availability and serviceability in a computer system by replacing a signal from a failed sensor with an estimated signal derived from correlations with other instrumentation signals in the computer system, comprising:
a failure determination mechanism configured to determine whether a sensor has failed in the computer system; and
a sensor replacement mechanism, wherein if the sensor has failed, the sensor replacement mechanism is configured to use an estimated signal for the failed sensor in place of the actual signal from the failed sensor during subsequent operation of the computer system, whereby the computer system can continue operating without the failed sensor;
wherein the estimated signal is derived from correlations with other instrumentation signals in the computer system.

22. The apparatus of claim 21, wherein the failure determination mechanism is configured to:
derive an estimated signal for a sensor from correlations with other instrumentation signals in the computer system; and to
compare a signal from the sensor with the estimated signal to determine whether the sensor has failed.

23. The apparatus of claim 22, wherein while comparing the signal from the sensor with the estimated signal, the failure detection mechanism is configured to use sequential detection methods to detect changes in the relationship between the signal from the failed sensor and the estimated signal.

24. The apparatus of claim 23, wherein the sequential detection methods include the Sequential Probability Ratio Test (SPRT).

25. The apparatus of claim 21, further comprising a correlation determination mechanism, which is configured to determine correlations between instrumentation signals in the computer system, whereby the correlations can subsequently be used to generate estimated signals.

26. The apparatus of claim 25, wherein the correlation determination mechanism is configured to use a non-linear, non-parametric regression technique to determine the correlations.

27. The apparatus of claim 26, wherein the non-linear, non-parametric regression technique can include a multivariate state estimation technique.

28. The apparatus of claim 25, wherein the correlation determination mechanism is configured to use a neural network to determine the correlations.

29. The apparatus of claim 21, wherein the instrumentation signals can include:
signals associated with internal performance parameters maintained by software within the computer system;
signals associated with physical performance parameters measured through sensors within the computer system; and
signals associated with canary performance parameters for synthetic user transactions, which are periodically generated for the purpose of measuring quality of service from an end user's perspective.

30. The apparatus of claim 21, wherein the failed sensor can be a sensor that has totally failed, or a sensor with degraded performance.
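By way of illustration only, and without limiting the claims, the cooperation between the failure determination mechanism and the sensor replacement mechanism of claim 21 can be sketched as a small wrapper around one sensor channel. The class name `SensorChannel`, the two callables, the use of `None` to model a sensor that reports nothing, and the choice to latch the failed state are all assumptions made for this sketch.

```python
class SensorChannel:
    """Hypothetical wrapper pairing a failure check with signal substitution."""

    def __init__(self, estimator, has_failed):
        self.estimator = estimator    # callable: other_signals -> estimated value
        self.has_failed = has_failed  # callable: (measured, estimated) -> bool
        self.failed = False           # latched once a failure is declared

    def read(self, measured, other_signals):
        """Return the value downstream consumers should use for this sensor."""
        estimated = self.estimator(other_signals)
        if not self.failed:
            # None models a sensor that has totally failed; the callable
            # covers a sensor whose readings have drifted or degraded.
            self.failed = measured is None or self.has_failed(measured, estimated)
        # After a failure, the estimate stands in for the sensor.
        return estimated if self.failed else measured
```

Latching the failed state is one possible reading of "during subsequent operation": once a sensor is declared failed, its raw readings are ignored until it is serviced. A usage example under the same assumptions:

```python
channel = SensorChannel(
    estimator=lambda s: sum(s) / len(s),        # toy estimate: mean of neighbors
    has_failed=lambda m, e: abs(m - e) > 5.0,   # toy check: fixed threshold
)
channel.read(41.0, [40.0, 42.0])  # healthy: returns the measured value
channel.read(60.0, [40.0, 42.0])  # declared failed: returns the estimate
```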